{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8eaafa26",
   "metadata": {},
   "source": [
    "## Description：\n",
    "这个笔记本主要是完成数据集的简单处理与采样工作，全量数据有8个多G的数据，数据量太大，不利于直接使用， 所以想从里面采样出高质量的一些用户数据来做后面模型相关的实验。采样方式如下:\n",
    "1. 采样前先处理点击日志这个大数据集， 这里的处理主要是删除记录\n",
    "    1. 先根据时间筛选， 一个12天的数据，这里选择出后7天的拿来用， 具体实现方法： 时间戳转成时间，然后筛选即可\n",
    "    2. 删除历史记录中的文章不在文章池子里的记录，这些记录即使保留着， 也没有文章画像， 不利于后面的实验， 实现方法， 拼接上文章画像表，然后把文章画像那部分为空记录删除掉，尤其是看不到类别或者发布时间的这种\n",
    "    3. 删除不合法的点击记录， 即文章的上传时间大于曝光时间的，相当于先曝光，再上传，这种不合法，实现方法：根据时间筛选即可\n",
    "    4. 删除没有历史点击的用户，即曝光但点击为0的这些用户， 方法: click字段筛选\n",
    "    5. 剩下的记录， 把观看时间太短的用户删除掉， 只保留3s以上的视频， 小于3s的默认是误点\n",
    "    6. 简单看下序列长度分布， 序列过长的用户也剔除掉\n",
    "2. 处理好的日志数据集，从里面采样出2w用户， 由于调试YouTubeDNN， 先快速跑出来， 后面这个可以再加\n",
    "\n",
    "读入数据的时候， 不能直接读入， 8个多G内存会爆掉，所以需要分块读入，比如一次读10000， 然后这10000条数据，按照上面的四步处理删除，把剩下的记录保存到一个pandas中。<br><br>\n",
    "\n",
    "下一次运行的时候，就可以直接从处理好的数据里面直接采样即可。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "ca6e065a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "import random\n",
    "from datetime import datetime\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "from tqdm import tqdm\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b572f45a",
   "metadata": {},
   "source": [
    "## 导入日志数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ffc4f3c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "base_path = '全量数据集/off_data'\n",
    "doc_info_path = os.path.join(base_path, 'doc_info.txt')\n",
    "train_data_path = os.path.join(base_path, 'train_data.txt')\n",
    "save_path = 'all_data/off_data.csv'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "c9053784",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 读入doc_info\n",
    "doc_info = pd.read_csv(doc_info_path, delimiter='\\t', names=['article_id', 'title','ctime', 'img_num', 'cat1', 'cat2', 'key_words'])\n",
    "article_ids = set(doc_info['article_id'])\n",
    "\n",
    "doc_info['ctime'] = doc_info['ctime'].str.replace('Android', '1625400960000')\n",
    "doc_info['ctime'].fillna('1625400960000', inplace=True)\n",
    "doc_info['ctime'] = doc_info['ctime'].apply(lambda x: datetime.fromtimestamp(float(x)/1000) \\\n",
    "                                                        .strftime('%Y-%m-%d %H:%M:%S'))\n",
    "doc_info['ctime'] = pd.to_datetime(doc_info['ctime'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "918020cf",
   "metadata": {},
   "source": [
    "## 过滤原始日志数据并写回到文件"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "8ae8c3c6",
   "metadata": {},
   "outputs": [],
   "source": [
    "def filter_record(chunk, article_ids, doc_info, start_time='2021-06-30 00:00:00'):\n",
    "    \n",
    "    # 根据时间筛选， 只用后七天的数据\n",
    "    chunk['expo_time'] = chunk['expo_time'].apply(lambda x: datetime.fromtimestamp(x/1000) \\\n",
    "                                                        .strftime('%Y-%m-%d %H:%M:%S'))\n",
    "    chunk['expo_time'] = pd.to_datetime(chunk['expo_time'])\n",
    "    chunk = chunk[chunk['expo_time'] >= start_time]\n",
    "    \n",
    "    # 去掉点击的文章不在总doc里面的记录\n",
    "    chunk = chunk[chunk['article_id'].isin(article_ids)]\n",
    "    \n",
    "    # 拼接上doc的ctime， 然后去掉曝光时间小于上传时间的\n",
    "    chunk = chunk.merge(doc_info[['article_id', 'ctime']], on='article_id', how='left')\n",
    "    chunk = chunk[chunk['expo_time'] > chunk['ctime']]\n",
    "    del chunk['ctime']\n",
    "    \n",
    "    # 标签\n",
    "    chunk = chunk[chunk['click'].isin([0, 1])]\n",
    "    \n",
    "    # duration\n",
    "    chunk = chunk[~((chunk['click']==1) & (chunk['duration']<3))]\n",
    "    \n",
    "    return chunk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "7bc1537e",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "1898it [33:07,  1.05s/it]\n"
     ]
    }
   ],
   "source": [
    "# 分块读入train_data，并处理每一块\n",
    "names = ['user_id', 'article_id', 'expo_time', 'net_status', 'flush_nums', 'exop_position', 'click', 'duration']\n",
    "train_data_reader = pd.read_csv(train_data_path, delimiter='\\t', chunksize=100000, iterator=True, names=names)\n",
    "count = 0\n",
    "for chunk in tqdm(train_data_reader):\n",
    "    count += 1\n",
    "    # print('过滤前形状: ', chunk.shape)\n",
    "    chunk = filter_record(chunk, article_ids, doc_info)\n",
    "    # print('过滤后形状: ', chunk.shape)\n",
    "    if count == 1:\n",
    "        chunk.to_csv(save_path,index = False)\n",
    "    else:\n",
    "        chunk.to_csv(save_path,index = False, mode = 'a',header = False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06876eee",
   "metadata": {},
   "source": [
    "## 读取新日志数据并采样"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "04b5514c",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_train_data = pd.read_csv('all_data/off_data.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "83f64a49",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(63241712, 8)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_train_data.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "8061c69e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>article_id</th>\n",
       "      <th>expo_time</th>\n",
       "      <th>net_status</th>\n",
       "      <th>flush_nums</th>\n",
       "      <th>exop_position</th>\n",
       "      <th>click</th>\n",
       "      <th>duration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1000014754</td>\n",
       "      <td>465426190</td>\n",
       "      <td>2021-07-04 15:07:01</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1000014754</td>\n",
       "      <td>465815972</td>\n",
       "      <td>2021-07-04 15:07:01</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>285</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1000014754</td>\n",
       "      <td>465991958</td>\n",
       "      <td>2021-07-04 15:07:01</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>353</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1000014754</td>\n",
       "      <td>464264603</td>\n",
       "      <td>2021-06-30 08:08:14</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>18</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1000014754</td>\n",
       "      <td>464140836</td>\n",
       "      <td>2021-06-30 07:35:31</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>13</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      user_id  article_id            expo_time  net_status  flush_nums  \\\n",
       "0  1000014754   465426190  2021-07-04 15:07:01           5           0   \n",
       "1  1000014754   465815972  2021-07-04 15:07:01           5           0   \n",
       "2  1000014754   465991958  2021-07-04 15:07:01           5           0   \n",
       "3  1000014754   464264603  2021-06-30 08:08:14           5           1   \n",
       "4  1000014754   464140836  2021-06-30 07:35:31           5           0   \n",
       "\n",
       "   exop_position  click  duration  \n",
       "0              5      0         0  \n",
       "1              4      1       285  \n",
       "2              0      1       353  \n",
       "3             18      0         0  \n",
       "4             13      0         0  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_train_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "62a392f8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1039309"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 采样之前， 把过短用户的序列删除掉\n",
    "new_train_data['user_id'].nunique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "d4d8c93f",
   "metadata": {},
   "outputs": [],
   "source": [
    "click_df = new_train_data[new_train_data['click']==1]\n",
    "user_click_count = click_df.groupby('user_id')['click'].apply(lambda x: x.count()).reset_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "20f4e992",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>click</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>6.430580e+05</td>\n",
       "      <td>643058.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>1.937426e+09</td>\n",
       "      <td>13.758264</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>5.209519e+08</td>\n",
       "      <td>21.891624</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1.734000e+04</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>1.497289e+09</td>\n",
       "      <td>2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>2.213135e+09</td>\n",
       "      <td>5.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>2.402583e+09</td>\n",
       "      <td>16.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>2.447274e+09</td>\n",
       "      <td>1563.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            user_id          click\n",
       "count  6.430580e+05  643058.000000\n",
       "mean   1.937426e+09      13.758264\n",
       "std    5.209519e+08      21.891624\n",
       "min    1.734000e+04       1.000000\n",
       "25%    1.497289e+09       2.000000\n",
       "50%    2.213135e+09       5.000000\n",
       "75%    2.402583e+09      16.000000\n",
       "max    2.447274e+09    1563.000000"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_click_count.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "65dab0ff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>click</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>17340</td>\n",
       "      <td>46</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>394666</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>450280</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>456646</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>489240</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   user_id  click\n",
       "0    17340     46\n",
       "1   394666      3\n",
       "2   450280      3\n",
       "3   456646      1\n",
       "4   489240     18"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_click_count.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "98d6b555",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 保留历史行为序列大于等于10的数据  且小于等于100的用户保留下来\n",
    "stay_users = set(user_click_count[(user_click_count['click'] >= 10) & (user_click_count['click'] <= 100)]['user_id'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "f559ffdb",
   "metadata": {},
   "outputs": [],
   "source": [
    "new_train_data = new_train_data[new_train_data['user_id'].isin(stay_users)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "2b1abbea",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(43928411, 8)"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_train_data.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2db521ae",
   "metadata": {},
   "source": [
    "## 数据保存"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "503ae98d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 先保存一份处理好的数据, 后面要是再采样， 可以用这份数据采\n",
    "new_train_data.to_csv('all_data/processed_data.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "aad2f51b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 按照用户id采样\n",
    "user_num = 20000\n",
    "user_ids = set(new_train_data['user_id'])\n",
    "sample_users = random.sample(user_ids, user_num)\n",
    "\n",
    "sample_train_data = new_train_data[new_train_data['user_id'].isin(sample_users)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "573fd873",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(3939989, 8)"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sample_train_data.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "dc17bcb3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>article_id</th>\n",
       "      <th>expo_time</th>\n",
       "      <th>net_status</th>\n",
       "      <th>flush_nums</th>\n",
       "      <th>exop_position</th>\n",
       "      <th>click</th>\n",
       "      <th>duration</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1659</th>\n",
       "      <td>1000541010</td>\n",
       "      <td>464467760</td>\n",
       "      <td>2021-06-30 09:57:14</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>13</td>\n",
       "      <td>1</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1660</th>\n",
       "      <td>1000541010</td>\n",
       "      <td>463850913</td>\n",
       "      <td>2021-06-30 09:57:14</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>15</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1661</th>\n",
       "      <td>1000541010</td>\n",
       "      <td>464022440</td>\n",
       "      <td>2021-06-30 09:57:14</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>17</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1662</th>\n",
       "      <td>1000541010</td>\n",
       "      <td>464586545</td>\n",
       "      <td>2021-06-30 09:58:31</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>20</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1663</th>\n",
       "      <td>1000541010</td>\n",
       "      <td>465352885</td>\n",
       "      <td>2021-07-03 18:13:03</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>18</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         user_id  article_id            expo_time  net_status  flush_nums  \\\n",
       "1659  1000541010   464467760  2021-06-30 09:57:14           2           0   \n",
       "1660  1000541010   463850913  2021-06-30 09:57:14           2           0   \n",
       "1661  1000541010   464022440  2021-06-30 09:57:14           2           0   \n",
       "1662  1000541010   464586545  2021-06-30 09:58:31           2           1   \n",
       "1663  1000541010   465352885  2021-07-03 18:13:03           5           0   \n",
       "\n",
       "      exop_position  click  duration  \n",
       "1659             13      1        28  \n",
       "1660             15      0         0  \n",
       "1661             17      0         0  \n",
       "1662             20      0         0  \n",
       "1663             18      0         0  "
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sample_train_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "c57e8f69",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 保存\n",
    "sample_train_data.to_csv('all_data/sample_2w_data.csv', index=False)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
